AITopics | conv 3 3

conv 3 3

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Spiking Vision Transformer with Saccadic Attention

Wang, Shuai, Zhang, Malu, Zhang, Dehao, Belatreche, Ammar, Xiao, Yichen, Liang, Yu, Shan, Yimeng, Sun, Qian, Zhang, Enqi, Yang, Yang

arXiv.org Artificial Intelligence, Feb-18-2025

The combination of Spiking Neural Networks (SNNs) and Vision Transformers (ViTs) holds potential for achieving both energy efficiency and high performance, making it particularly suitable for edge vision applications. However, a significant performance gap still exists between SNN-based ViTs and their ANN counterparts. Here, we first analyze why SNN-based ViTs suffer from limited performance and identify a mismatch between the vanilla self-attention mechanism and spatio-temporal spike trains. This mismatch results in degraded spatial relevance and limited temporal interactions. To address these issues, we draw inspiration from biological saccadic attention mechanisms and introduce an innovative Saccadic Spike Self-Attention (SSSA) method. Specifically, in the spatial domain, SSSA employs a novel spike distribution-based method to effectively assess the relevance between Query and Key pairs in SNN-based ViTs. Temporally, SSSA employs a saccadic interaction module that dynamically focuses on selected visual areas at each timestep and significantly enhances whole scene understanding through temporal interactions. Building on the SSSA mechanism, we develop an SNN-based Vision Transformer (SNN-ViT). Extensive experiments across various visual tasks demonstrate that SNN-ViT achieves state-of-the-art performance with linear computational complexity. The effectiveness and efficiency of the SNN-ViT highlight its potential for power-critical edge vision applications.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2502.12677

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

SC-CDM: Enhancing Quality of Image Semantic Communication with a Compact Diffusion Model

Zhang, Kexin, Li, Lixin, Lin, Wensheng, Yan, Yuna, Cheng, Wenchi, Han, Zhu

arXiv.org Artificial Intelligence, Oct-2-2024

Semantic Communication (SC) is an emerging technology that has attracted much attention in sixth-generation (6G) mobile communication systems. However, few studies have fully considered the perceptual quality of the reconstructed image. To solve this problem, we propose a generative SC for wireless image transmission (denoted as SC-CDM). This approach leverages compact diffusion models to improve the fidelity and semantic accuracy of the images reconstructed after transmission, ensuring that the essential content is preserved even in bandwidth-constrained environments. Specifically, we aim to redesign the Swin Transformer as a new backbone for efficient semantic feature extraction and compression. Next, the receiver integrates the slim prior and image reconstruction networks. Compared to traditional Diffusion Models (DMs), it leverages DMs' robust distribution mapping capability to generate a compact condition vector that guides image recovery, thus enhancing the perceptual details of the reconstructed images. Finally, a series of evaluation and ablation studies are conducted to validate the effectiveness and robustness of the proposed algorithm, which further increases the Peak Signal-to-Noise Ratio (PSNR) by over 17% on top of CNN-based DeepJSCC.
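For readers unfamiliar with the metric cited in this abstract, the following is a minimal sketch of the standard PSNR definition, PSNR = 10 · log10(MAX² / MSE). It is not taken from the SC-CDM paper: the function name `psnr`, the flat-list image representation, and the 8-bit peak value of 255 are assumptions made for illustration.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel sequences.

    Assumption: images are given as flat sequences of pixel intensities,
    and max_value is the largest possible intensity (255 for 8-bit images).
    """
    if len(reference) != len(reconstructed) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the reference and the reconstruction.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because PSNR is logarithmic, a percentage gain in PSNR (as reported in the abstract) is a statement about the dB value, not about the underlying mean squared error.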

communication, diffusion model, semantic information, (11 more...)

arXiv.org Artificial Intelligence

2410.02121

Country:

North America > United States > Rhode Island (0.04)
Europe > Greece (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)
(4 more...)

Genre: Research Report (0.64)

Industry: Information Technology (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Analyzing and Improving the Training Dynamics of Diffusion Models

Karras, Tero, Aittala, Miika, Lehtinen, Jaakko, Hellsten, Janne, Aila, Timo, Laine, Samuli

arXiv.org Machine Learning, Dec-5-2023

Diffusion models currently dominate the field of data-driven image synthesis with their unparalleled scaling to large datasets. In this paper, we identify and rectify several causes for uneven and ineffective training in the popular ADM diffusion model architecture, without altering its high-level structure. Observing uncontrolled magnitude changes and imbalances in both the network activations and weights over the course of training, we redesign the network layers to preserve activation, weight, and update magnitudes on expectation. We find that systematic application of this philosophy eliminates the observed drifts and imbalances, resulting in considerably better networks at equal computational complexity. Our modifications improve the previous record FID of 2.41 in ImageNet-512 synthesis to 1.81, achieved using fast deterministic sampling. As an independent contribution, we present a method for setting the exponential moving average (EMA) parameters post-hoc, i.e., after completing the training run. This allows precise tuning of EMA length without the cost of performing several training runs, and reveals its surprising interactions with network architecture, training time, and guidance.
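As background for the EMA discussion in this abstract, here is a minimal sketch of a conventional exponential moving average applied during training. It is not the post-hoc method the paper proposes; it only illustrates the quantity whose length that method tunes. The names `ema_update`, `ema_trajectory`, and the decay parameter `beta` are assumptions for illustration.

```python
def ema_update(ema_value: float, new_value: float, beta: float) -> float:
    """One conventional EMA step: keep a fraction beta of the running average."""
    return beta * ema_value + (1.0 - beta) * new_value


def ema_trajectory(values, beta, start=0.0):
    """Running EMA over a sequence of scalar values (e.g. one weight across steps)."""
    trajectory = []
    current = start
    for value in values:
        current = ema_update(current, value, beta)
        trajectory.append(current)
    return trajectory
```

In this conventional form, `beta` is fixed before training starts, so trying a different averaging length means repeating the run. That cost is what the abstract says the post-hoc approach avoids.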

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Machine Learning

2312.02696

Country: Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Evaluation of Complexity Measures for Deep Learning Generalization in Medical Image Analysis

Vakanski, Aleksandar, Xian, Min

arXiv.org Artificial Intelligence, Jul-19-2023

The generalization performance of deep learning models for medical image analysis often decreases on images collected with different data acquisition devices, device settings, or patient populations. A better understanding of the generalization capacity on new images is crucial for clinicians' trust in deep learning. Although significant research efforts have recently been directed toward establishing generalization bounds and complexity measures, there is still often a significant discrepancy between the predicted and actual generalization performance. In addition, related large empirical studies have been primarily based on validation with general-purpose image datasets. This paper presents an empirical study that investigates the correlation between 25 complexity measures and the generalization abilities of supervised deep learning classifiers for breast ultrasound images. The results indicate that PAC-Bayes flatness-based and path norm-based measures produce the most consistent explanation for the combination of models and data. We also investigate the use of a multi-task classification and segmentation approach for breast images, and report that such a learning approach acts as an implicit regularizer and is conducive to improved generalization.

artificial intelligence, complexity measure, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/MLSP52302.2021.9596501

2103.03328

Country:

North America > United States > Idaho > Bonneville County > Idaho Falls (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(7 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (0.85)
Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Interpreting Spatially Infinite Generative Models

Lu, Chaochao, Turner, Richard E., Li, Yingzhen, Kushman, Nate

arXiv.org Machine Learning, Jul-24-2020

Traditional deep generative models of images and other spatial modalities can only generate fixed-size outputs. The generated images have exactly the same resolution as the training images, which is dictated by the number of layers in the underlying neural network. Recent work has shown, however, that feeding spatial noise vectors into a fully convolutional neural network enables both generation of arbitrary-resolution output images and training on arbitrary-resolution training images. While this work has provided impressive empirical results, little theoretical interpretation was provided to explain the underlying generative process. In this paper we provide a firm theoretical interpretation for infinite spatial generation, by drawing connections to spatial stochastic processes. We use the resulting intuition to improve upon existing spatially infinite generative models to enable more efficient training through a model that we call an infinite generative adversarial network, or $\infty$-GAN. Experiments on world map generation, panoramic images and texture synthesis verify the ability of $\infty$-GAN to efficiently generate images of arbitrary size.

artificial intelligence, machine learning, stochastic process, (17 more...)

arXiv.org Machine Learning

2007.12411

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Learning stochastic object models from medical imaging measurements using Progressively-Growing AmbientGANs

Zhou, Weimin, Bhadra, Sayantan, Brooks, Frank J., Li, Hua, Anastasio, Mark A.

arXiv.org Machine Learning, May-29-2020

It has been advocated that medical imaging systems and reconstruction algorithms should be assessed and optimized by use of objective measures of image quality that quantify the performance of an observer at specific diagnostic tasks. One important source of variability that can significantly limit observer performance is variation in the objects to-be-imaged. This source of variability can be described by stochastic object models (SOMs). A SOM is a generative model that can be employed to establish an ensemble of to-be-imaged objects with prescribed statistical properties. In order to accurately model variations in anatomical structures and object textures, it is desirable to establish SOMs from experimental imaging measurements acquired by use of a well-characterized imaging system. Deep generative neural networks, such as generative adversarial networks (GANs), hold great potential for this task. However, conventional GANs are typically trained by use of reconstructed images that are influenced by the effects of measurement noise and the reconstruction process. To circumvent this, an AmbientGAN has been proposed that augments a GAN with a measurement operator. However, the original AmbientGAN could not immediately benefit from modern training procedures, such as progressive growing, which limited its ability to be applied to realistically sized medical image data. To address this limitation, in this work, a new Progressive Growing AmbientGAN (ProAmGAN) strategy is developed for establishing SOMs from medical imaging measurements. Stylized numerical studies corresponding to common medical imaging modalities are conducted to demonstrate and validate the proposed method for establishing SOMs.

artificial intelligence, conv 3 3, machine learning, (19 more...)

arXiv.org Machine Learning

2006.00033

Country:

North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > United States > Missouri > St. Louis County > St. Louis (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

DIHARD II is Still Hard: Experimental Results and Discussions from the DKU-LENOVO Team

Lin, Qingjian, Cai, Weicheng, Yang, Lin, Wang, Junjie, Zhang, Jun, Li, Ming

arXiv.org Machine Learning, Feb-23-2020

In this paper, we present the system submitted by the DKU-LENOVO team to the second DIHARD Speech Diarization Challenge. Our diarization system includes multiple modules, namely voice activity detection (VAD), segmentation, speaker embedding extraction, similarity scoring, clustering, resegmentation and overlap detection. For each module, we explore different techniques to enhance performance. Our final submission employs the ResNet-LSTM based VAD, the Deep ResNet based speaker embedding, the LSTM based similarity scoring and spectral clustering. Variational Bayes (VB) diarization is applied in the resegmentation stage, and overlap detection also brings a slight improvement. Our proposed system achieves 18.84% DER in Track 1 and 27.90% DER in Track 2. Although our systems have reduced the DERs by 27.5% and 31.7% relative to the official baselines, we believe that the diarization task is still very difficult.
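The relative reductions quoted in this abstract can be related to the reported DERs with a short calculation. The sketch below assumes the usual definition, relative reduction = (baseline - system) / baseline, and back-computes the baseline DER that each pair of figures implies. The function names are hypothetical, and the implied baselines are derived from the abstract's numbers only, not from the challenge's published results.

```python
def relative_reduction(baseline_der: float, system_der: float) -> float:
    """Fraction by which system_der improves on baseline_der."""
    return (baseline_der - system_der) / baseline_der


def implied_baseline(system_der: float, reduction: float) -> float:
    """Baseline DER implied by a system DER and a relative reduction."""
    return system_der / (1.0 - reduction)


# Figures quoted in the abstract (DER in percent, reductions as fractions).
track1_baseline = implied_baseline(18.84, 0.275)
track2_baseline = implied_baseline(27.90, 0.317)
print(f"Implied Track 1 baseline DER: {track1_baseline:.2f}%")
print(f"Implied Track 2 baseline DER: {track2_baseline:.2f}%")
```

Under that assumption, the implied baselines come out at roughly 26% for Track 1 and roughly 41% for Track 2.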

conv 3 3, ieee international conference, signal processing, (14 more...)

arXiv.org Machine Learning

2002.12761

Country:

Asia > China > Jiangsu Province (0.04)
Asia > China > Hubei Province > Wuhan (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.64)

Industry: Information Technology > Hardware (0.41)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Convolutional Networks with Dense Connectivity

Huang, Gao, Liu, Zhuang, Pleiss, Geoff, van der Maaten, Laurens, Weinberger, Kilian Q.

arXiv.org Machine Learning, Jan-8-2020

Recent work has shown that convolutional networks can be substantially deeper, more accurate, and more efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections - one between each layer and its subsequent layer - our network has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, encourage feature reuse and substantially improve parameter efficiency. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring fewer parameters and less computation to achieve high performance.
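The L(L+1)/2 count in this abstract can be checked by enumeration. The sketch below is an illustration, not code from the paper: it assumes that layer k (1-indexed) receives the original input plus the feature-maps of the k-1 layers before it, so it has k incoming connections. The function names are hypothetical.

```python
def dense_connections(num_layers: int) -> int:
    """Closed-form count of direct connections in a densely connected stack."""
    return num_layers * (num_layers + 1) // 2


def dense_connections_by_enumeration(num_layers: int) -> int:
    """Count connections by listing every (source, target) pair.

    Source 0 stands for the original input; sources 1..L are layer outputs.
    Layer `target` takes input from every source with a smaller index.
    """
    pairs = [
        (source, target)
        for target in range(1, num_layers + 1)
        for source in range(0, target)
    ]
    return len(pairs)


def plain_connections(num_layers: int) -> int:
    """A traditional chain: each layer has one incoming connection."""
    return num_layers
```

For example, a 5-layer stack has 5 connections as a plain chain but 15 when densely connected, and the gap grows quadratically with depth.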

architecture, densenet, efficiency, (16 more...)

arXiv.org Machine Learning

2001.02394

Country:

North America > United States > New York > Tompkins County > Ithaca (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Unified Probabilistic Deep Continual Learning through Generative Replay and Open Set Recognition

Mundt, Martin, Majumder, Sagnik, Pliushch, Iuliia, Ramesh, Visvanathan

arXiv.org Machine Learning, May-28-2019

We introduce a unified probabilistic approach for deep continual learning based on variational Bayesian inference with open set recognition. Our model combines a probabilistic encoder with a generative model and a generative linear classifier that are shared across tasks. The open set recognition bounds the approximate posterior by fitting regions of high density on the basis of correctly classified data points and balances open-space risk with recognition errors. Catastrophic interference for both generative models is significantly alleviated through generative replay, where the open set recognition is used to sample from high-density areas of the class-specific posterior and reject statistical outliers. Our approach naturally allows for forward and backward transfer while maintaining past knowledge without the necessity of storing old data, regularization or inferring task labels. We demonstrate compelling results in the challenging scenario of incrementally expanding the single-head classifier for both class incremental visual and audio classification tasks, as well as incremental learning of datasets across modalities.

artificial intelligence, generative replay, machine learning, (15 more...)

arXiv.org Machine Learning

1905.12019

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
